Abstract

Dynamic ensemble selection (DES) techniques work by estimating the level of competence of
each classifier from a pool of classifiers. Only the most competent ones are selected to classify
a given test sample. Hence, the key issue in DES is the criterion used to estimate the level of
competence of the classifiers in predicting the label of a given test sample. In order to perform
a more robust ensemble selection, we proposed the META-DES framework using meta-learning,
where multiple criteria are encoded as meta-features and are passed down to a meta-classifier
that is trained to estimate the competence level of a given classifier. In this technical report,
we present a step-by-step analysis of each phase of the framework during training and test. We
show how each set of meta-features is extracted as well as their impact on the estimation of the
competence level of the base classifier. Moreover, we analyze the impact of several factors
on the system performance, such as the number of classifiers in the pool, the use of different
linear base classifiers, and the size of the validation data. We show that using the dynamic
selection of linear classifiers through the META-DES framework, we can solve complex non-linear
classification problems where other combination techniques such as AdaBoost cannot.
1. Introduction


Multiple Classifier Systems (MCS) aim to combine classifiers in order to increase the recognition
accuracy in pattern recognition systems [1, 2]. MCS are composed of three phases [3]: (1)
Generation, (2) Selection, and (3) Integration. In the first phase, a pool of classifiers is generated.
In the second phase, a single classifier or a subset having the best classifiers of the pool is
selected. We refer to the subset of classifiers as the Ensemble of Classifiers (EoC). In the last phase,
integration, the predictions of the selected classifiers are combined to obtain the final decision [1].

Recent works in MCS have shown that dynamic ensemble selection (DES) techniques achieve
higher classification accuracy when compared to static ones [3, 4, 5]. This is especially true for
ill-defined problems, i.e., for problems where the size of the training data is small, and there
are not enough data available to train the classifiers [6, 7]. The key issue in DES is to define a
criterion to measure the level of competence of a base classifier. Most DES techniques [5, 8, 9, 10]
estimate the classifiers' local accuracy in small regions of the feature space surrounding the query
instance, called the region of competence, as a search criterion for estimating the competence level
of the base classifier. However, in our previous work [10], we demonstrated that the use of local
accuracy estimates alone is insufficient to provide higher classification performance. In addition, a
dissimilarity analysis among eight dynamic selection techniques, performed in [11], indicates that
techniques based on different criteria for estimating the competence level of base classifiers yield
different results.

To tackle this issue, in [4] we proposed a novel DES framework, called META-DES, in which
multiple criteria regarding the behavior of a base classifier are used to compute its level of
competence. The framework is based on two environments: the classification environment, in which
the input features are mapped into a set of class labels, and the meta-classification environment,
where several properties from the classification environment, such as the classifier accuracy in a
local region of the feature space, are extracted from the training data and encoded as meta-features.
Given a test sample, the meta-features are extracted using the test sample as reference, and used as
input to the meta-classifier. The meta-classifier decides whether the base classifier is competent
enough to classify the test sample.

One interesting property of the META-DES framework is that it obtains higher classification
accuracy using only linear classifiers. In this work, we perform a deep analysis of the training and
classification steps of the META-DES framework. We perform step-by-step examples in order
to show the influence of different sets of meta-features used to better estimate the competence of
the base classifier. The analysis is conducted using the P2 problem [12, 13], which is a two-class
non-linear problem with a complex decision boundary. Furthermore, the two classes of the P2
problem have multiple class means, making it a difficult classification problem.

The following points of the META-DES framework are studied: 

• The use of weak, linear classifiers in the pool. In this work we consider both Perceptrons 
and Decision Stumps as base classifiers. 

• The influence of each set of meta-features for estimating the competence of a base classifier. 

• The influence of the dynamic selection set (DSEL)¹ on the recognition rate. The dynamic
selection data is used in order to extract the meta-features.

• The influence of the size of the pool on the classification accuracy of the META-DES
framework.

The contributions of this work are as follows: 

• It shows that using dynamic selection of linear and weak classifiers, such as Perceptrons 
and Decision stumps, we can solve problems with complex decision boundaries, including 
classification problems with multiple class centers. 

• It allows an understanding of why the META-DES framework achieves high recognition
accuracy using only linear classifiers. In previous works, the META-DES was presented
as a black box system. In this work, we use a step-by-step example to illustrate how the
framework is able to select the competent classifiers based on the five defined sets of
meta-features.

• It compares the dynamic selection of linear classifiers against static combination rules such
as AdaBoost, as well as classical single classifier models, such as Multi-Layer Perceptron
neural networks, Random Forest, and Support Vector Machines (SVMs).

This document is organized as follows. Theoretical aspects of dynamic selection are introduced 
in Section 2. The META-DES framework is presented in Section 3. An illustrative example of the 
META-DES is presented in Section 4. Experiments are carried out in Section 5. Conclusions are 
given in the last section. 


¹ DSEL is often called validation data in several dynamic selection techniques.
2. Why does dynamic selection of linear classifiers work?

Let $C = \{c_1, \ldots, c_M\}$ ($M$ is the size of the pool of classifiers) be the pool of classifiers
and $c_i$ a base classifier belonging to the pool $C$. The goal of dynamic selection is to find an
ensemble of classifiers $C' \subset C$ that has the best classifiers to classify a given test sample $x_j$.
DES techniques rely on the assumption that each base classifier is an expert in a different local
region of the feature space [14]. Only the classifiers that attain a certain competence level,
according to a selection criterion, are selected to predict the label of $x_j$. This is a different
strategy when compared with static selection, where the ensemble of classifiers $C'$ is selected
during the training phase, considering the global performance of the base classifiers over a
validation dataset [15, 16, 17, 18].

When dealing with dynamic selection, we aim to select the appropriate classifier(s) for a
specific test sample $x_j$, rather than find the best decision border separating the classes. This is a
different concept, as compared to classical classification models, such as Support Vector Machines
(SVM) or Multi-Layer Perceptron (MLP) Neural Networks, in the sense that these classifiers
search for the best separation between the classes during the training stages. This is an important
property of dynamic selection techniques, which makes them suitable for solving problems that
are ill-defined, i.e., when there is not enough data available to train a strong classifier having a
lot of parameters to learn [6]. In addition, due to insufficient training data, the distribution of the
training data may not adequately represent the real distribution of the problem. Consequently, the
classifiers cannot learn the separation between the classes.

Let us consider, for instance, two circles representing the exclusive or (XOR) problem. The
problem is generated with 1000 data points, 500 for each class (Figure 1 (a)). Two linear classifiers
(two Perceptrons), $c_1$ and $c_2$, are trained for this problem, both with an individual accuracy of
50%. The decisions of $c_1$ and $c_2$ are shown in Figure 1 (b) and (c), respectively.


Figure 1: (a) The two circles data generated with 1000 data points, 500 samples for each class; (b) the
decision made by the Perceptron $c_1$; (c) the decision made by the Perceptron $c_2$.

Static combination rules, such as majority voting or averaging, are useless in this case since the
base classifiers always yield opposite decisions, i.e., for any query sample $x_j$, if $c_1$ predicts that
$x_j$ belongs to class 1, $c_2$ will predict that $x_j$ belongs to class 2 and vice versa. There is never a
consensus between the decisions obtained by these two classifiers.

Considering the same data, it is possible to split the feature space into four local regions
(Figure 2): Q1, Q2, Q3 and Q4.

Figure 2: The two circles data divided into four regions.

Using dynamic selection, it is possible to obtain a 100% accuracy rate using only these two
classifiers. Given a query instance $x_j$, the system first checks the competence of each classifier in
the pool. Only the classifier(s) with the highest competence are selected. Classifiers that are not
experts in the local region will not influence the ensemble decision.

Given a query sample $x_j$ to be classified, using dynamic selection, the classification is
performed as follows (Equation 1):

\[
\begin{cases}
\text{Select } c_1 & \text{if } x_j \in Q1 \\
\text{Select } c_2 & \text{if } x_j \in Q2 \\
\text{Select } c_2 & \text{if } x_j \in Q3 \\
\text{Select } c_1 & \text{if } x_j \in Q4
\end{cases}
\tag{1}
\]

The key issue in DES is to define a criterion to measure the level of competence of a base
classifier. Most DES techniques [5, 8, 9, 10, 19, 20, 21, 22] use estimates of the classifiers' local
accuracy in small regions of the feature space surrounding the query instance as a search
criterion to perform the ensemble selection. There are other criteria, such as the degree of consensus
in the ensemble [23], probabilistic models applied to the classifier outputs [24] and decision
templates [6, 7]. A recent survey on dynamic selection [3] covers all the DES criteria used by different
techniques.

In [4, 25], we proposed a novel DES framework in which multiple criteria regarding the
behavior of a base classifier are used to have a better estimation of its level of competence. The
META-DES framework is presented in the following sections.
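
The XOR argument of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (250 points per quadrant), the two half-plane classifiers `c1` and `c2`, and the hand-written region rule in `select` are hypothetical stand-ins for the two Perceptrons and the regions Q1 to Q4 of Figure 2, not the setup used to produce the figures.

```python
import random

random.seed(0)

# Hypothetical XOR layout: 250 points per quadrant, class 1 where x * y > 0.
quadrant_signs = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
points = [(sx * random.uniform(0.1, 1.0), sy * random.uniform(0.1, 1.0))
          for sx, sy in quadrant_signs for _ in range(250)]
labels = [1 if x * y > 0 else 2 for x, y in points]


def c1(p):
    """Linear classifier: predicts class 1 on the right half-plane."""
    return 1 if p[0] > 0 else 2


def c2(p):
    """Linear classifier that always gives the opposite decision to c1."""
    return 2 if p[0] > 0 else 1


def select(p):
    """Region-based selection: c1 is the local expert above the x-axis, c2 below."""
    return c1 if p[1] > 0 else c2


def accuracy(predict):
    """Fraction of points for which predict returns the true label."""
    return sum(predict(p) == t for p, t in zip(points, labels)) / len(points)


acc_c1 = accuracy(c1)
acc_c2 = accuracy(c2)
acc_des = accuracy(lambda p: select(p)(p))
print(acc_c1, acc_c2, acc_des)
```

In this toy setting each classifier alone is right on exactly half of the points, the two never agree (so voting cannot help), and selecting the local expert per region classifies every point correctly.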
3. The META-DES Framework


The META-DES framework is based on the assumption that the dynamic ensemble selection
problem can be considered as a meta-problem. This meta-problem uses different criteria regarding
the behavior of a base classifier $c_i$, in order to decide whether it is competent enough to classify a
given test sample $x_j$. The meta-problem is defined as follows [4]:

• The meta-classes of this meta-problem are either "competent" (1) or "incompetent" (0) to
classify $x_j$.

• Each set of meta-features $f_i$ corresponds to a different criterion for measuring the level of
competence of a base classifier.

• The meta-features are encoded into a meta-features vector $v_{i,j}$.

• A meta-classifier $\lambda$ is trained based on the meta-features $v_{i,j}$ to predict whether or not $c_i$
will achieve the correct prediction for $x_j$, i.e., if it is competent enough to classify $x_j$.

A general overview of the META-DES framework is depicted in Figure 3. It is divided into
three phases: Overproduction, Meta-training and Generalization.

3.1. Overproduction 

In this step, the pool of classifiers $C = \{c_1, \ldots, c_M\}$, where $M$ is the pool size, is generated
using the training dataset $\mathcal{T}$. The Bagging technique [26] is used in this work in order to build a
diverse pool of classifiers.

3.2. Meta-Training 

In this phase, the meta-features are computed and used to train the meta-classifier $\lambda$. As shown
in Figure 3, the meta-training stage consists of three steps: sample selection, meta-features
extraction process and meta-training. A different dataset $\mathcal{T}_\lambda$ is used in this phase to prevent overfitting.

3.2.1. Sample selection 

We decided to focus the training of $\lambda$ on cases in which the extent of consensus of the pool
is low. This decision was based on the observations made in [23, 6] that the main issues in dynamic
ensemble selection occur when classifying testing instances where the degree of consensus among
the pool of classifiers is low, i.e., when the number of votes from the winning class is close to or
even equal to the number of votes from the second class. We employ a sample selection mechanism
based on a threshold $h_C$, called the consensus threshold. For each $x_{j,train_\lambda} \in \mathcal{T}_\lambda$, the degree of
consensus of the pool, denoted by $H(x_{j,train_\lambda}, C)$, is computed. If $H(x_{j,train_\lambda}, C)$ falls below the
threshold $h_C$, $x_{j,train_\lambda}$ is passed down to the meta-features extraction process.
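
The sample selection step can be sketched as follows. The text does not give a formula for the degree of consensus $H$, so this sketch assumes it is the fraction of the pool voting for the most popular class; the function names, the vote matrix and the default threshold of 0.7 are illustrative choices, not part of the framework definition.

```python
from collections import Counter


def degree_of_consensus(votes):
    """Fraction of the pool voting for the most popular class (assumed definition of H)."""
    counts = Counter(votes)
    return counts.most_common(1)[0][1] / len(votes)


def select_samples(pool_votes, h_c=0.7):
    """Return the indices of the samples whose consensus falls below the threshold h_c."""
    return [j for j, votes in enumerate(pool_votes)
            if degree_of_consensus(votes) < h_c]


# Votes of a hypothetical pool of 5 classifiers for 4 samples (one row per sample).
pool_votes = [
    [1, 1, 1, 1, 1],  # full agreement
    [1, 1, 1, 2, 2],  # 3 of 5 agree
    [2, 2, 2, 2, 1],  # 4 of 5 agree
    [1, 2, 1, 2, 2],  # 3 of 5 agree
]
selected = select_samples(pool_votes)
print(selected)
```

Only the two samples on which the pool is split 3 to 2 fall below the threshold and would be passed on to the meta-feature extraction.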
Figure 3: Overview of the proposed framework. It is divided into three steps: 1) Overproduction, where the pool of
classifiers $C = \{c_1, \ldots, c_M\}$ is generated; 2) the training of the selector $\lambda$ (meta-classifier); and 3) the
generalization phase, where the level of competence $\delta_{i,j}$ of each base classifier $c_i$ is calculated specifically for
each new test sample $x_{j,test}$. Then, the level of competence $\delta_{i,j}$ is used by the combination approach to predict
the label $w_l$ of the test sample $x_{j,test}$. Three combination approaches are considered: Dynamic selection
(META-DES.S), Dynamic weighting (META-DES.W) and Hybrid (META-DES.H). $h_C$, $K$, $K_p$ and $\Upsilon$ are the
hyper-parameters required by the proposed system. [Adapted from [4]].


3.2.2. Meta-feature extraction 

The first step in extracting the meta-features involves computing the region of competence of
$x_{j,train_\lambda}$, denoted by $\theta_j = \{x_1, \ldots, x_K\}$. The region of competence is defined in the $\mathcal{T}_\lambda$ set using
the K-Nearest Neighbor algorithm. Then, $x_{j,train_\lambda}$ is transformed into an output profile, $\tilde{x}_{j,train_\lambda}$.
The output profile of the instance $x_{j,train_\lambda}$ is denoted by
$\tilde{x}_{j,train_\lambda} = \{\tilde{x}_{j,train_\lambda,1}, \tilde{x}_{j,train_\lambda,2}, \ldots, \tilde{x}_{j,train_\lambda,M}\}$,
where each $\tilde{x}_{j,train_\lambda,i}$ is the decision yielded by the base classifier $c_i$ for the sample $x_{j,train_\lambda}$ [6].
The similarity between $\tilde{x}_{j,train_\lambda}$ and the output profiles of the instances in $\mathcal{T}_\lambda$ is obtained
through the Euclidean distance. The most similar output profiles are selected to form the set
$\phi_j = \{\tilde{x}_1, \ldots, \tilde{x}_{K_p}\}$, where each output profile $\tilde{x}_k$ is associated with a label $w_{l,k}$. Next, for each
base classifier $c_i \in C$, five sets of meta-features are calculated:


• $f_1$ - Neighbors' hard classification: First, a vector with $K$ elements is created. For each
sample $x_k$, belonging to the region of competence $\theta_j$, if $c_i$ correctly classifies $x_k$, the $k$-th
position of the vector is set to 1, otherwise it is 0. Thus, $K$ meta-features are computed.

• $f_2$ - Posterior probability: First, a vector with $K$ elements is created. Then, for each sample
$x_k$, belonging to the region of competence $\theta_j$, the posterior probability of $c_i$, $P(w_l \mid x_k)$, is
computed and inserted into the $k$-th position of the vector. Consequently, $K$ meta-features
are computed.

• $f_3$ - Overall local accuracy: The accuracy of $c_i$ over the whole region of competence $\theta_j$ is
computed and encoded as $f_3$.

• $f_4$ - Output profiles classification: First, a vector with $K_p$ elements is generated. Then, for
each member $\tilde{x}_k$ belonging to the set of output profiles $\phi_j$, if the label produced by $c_i$ for $x_k$
is equal to the label $w_{l,k}$ of $\tilde{x}_k$, the $k$-th position of the vector is set to 1, otherwise it is set to
0. A total of $K_p$ meta-features are extracted using output profiles.

• $f_5$ - Classifier's confidence: The perpendicular distance between the reference sample $x_j$
and the decision boundary of the base classifier $c_i$ is calculated and encoded as $f_5$.


A vector $v_{i,j} = \{f_1 \cup f_2 \cup f_3 \cup f_4 \cup f_5\}$ (Figure 4) is obtained at the end of the process. If
$c_i$ correctly classifies $x_j$, the class attribute of $v_{i,j}$, $\alpha_{i,j} = 1$ (i.e., $v_{i,j}$ belongs to the meta-class
"competent"), otherwise $\alpha_{i,j} = 0$. $v_{i,j}$ is stored in the meta-features dataset $\mathcal{T}^*_\lambda$ that is used to train
the meta-classifier $\lambda$. Figure 4 illustrates the format of the meta-features vector $v_{i,j}$.


Figure 4: Feature vector containing the meta-information about the behavior of a base classifier. A total of 5 different
sets of meta-features are considered. The size of the feature vector is $(2 \times K) + K_p + 2$. The class attribute indicates
whether or not $c_i$ correctly classified the input sample.
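
As a minimal sketch of how the five sets of meta-features are assembled into one vector, the helper below concatenates them in the order $f_1$ to $f_5$. The function name and argument names are hypothetical, and it assumes that the per-neighbor hits, posterior probabilities, output-profile hits and confidence value have already been computed for one pair of base classifier and sample.

```python
def meta_feature_vector(hits_region, posteriors_region, hits_profiles, confidence):
    """Concatenate f1..f5 for one (base classifier, sample) pair.

    hits_region       -- K values, 1 if the classifier is correct on the k-th neighbor (f1)
    posteriors_region -- K posterior probabilities, one per neighbor (f2)
    hits_profiles     -- Kp values, 1 if correct on the k-th most similar output profile (f4)
    confidence        -- distance from the sample to the classifier's decision boundary (f5)
    """
    if len(hits_region) != len(posteriors_region):
        raise ValueError("f1 and f2 must both have K entries")
    # f3: overall local accuracy over the whole region of competence.
    f3 = sum(hits_region) / len(hits_region)
    return (list(hits_region) + list(posteriors_region) + [f3]
            + list(hits_profiles) + [confidence])


# Illustrative input with K = 7 and Kp = 5.
K, Kp = 7, 5
vector = meta_feature_vector(
    hits_region=[1, 1, 1, 1, 1, 1, 1],
    posteriors_region=[0.65, 0.66, 0.59, 0.62, 0.66, 0.76, 0.61],
    hits_profiles=[1, 1, 1, 1, 1],
    confidence=0.97,
)
print(len(vector))
```

With $K = 7$ and $K_p = 5$ the vector has $(2 \times 7) + 5 + 2 = 21$ entries, which matches the size given in the caption of Figure 4.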
3.2.3. Training

The last step of the meta-training phase is the training of the meta-classifier $\lambda$. In this work, we
considered a Naive Bayes for the meta-classifier $\lambda$, since this classifier model presented the best
classification results for the META-DES framework when compared against different classifier
models, such as a Multi-Layer Perceptron Neural Network and a Random Forest classifier [27].

3.3. Generalization 

Given the query sample $x_{j,test}$, the region of competence $\theta_j$ is computed using the samples
from the dynamic selection dataset DSEL. Following that, the output profile $\tilde{x}_{j,test}$ of the test
sample $x_{j,test}$ is calculated. The set with $K_p$ similar output profiles $\phi_j$ of the query sample
$x_{j,test}$ is obtained through the Euclidean distance applied over the output profiles of the dynamic
selection dataset, DSEL.
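
The two neighborhood searches described above (in the feature space for $\theta_j$, and in the decision space for $\phi_j$) can be sketched with the same nearest-neighbor helper. The pool of three threshold classifiers, the six DSEL points and the query are made up for illustration; only the use of the Euclidean distance in both spaces comes from the text.

```python
import math


def k_nearest(query, candidates, k):
    """Indices of the k candidates closest to query under the Euclidean distance."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(query, candidates[i]))
    return order[:k]


def output_profile(x, pool):
    """Vector holding the decision of every base classifier for the sample x."""
    return [clf(x) for clf in pool]


# Hypothetical pool of three linear classifiers on 2-D inputs.
pool = [
    lambda x: 1 if x[0] > 0.5 else 2,
    lambda x: 1 if x[1] > 0.5 else 2,
    lambda x: 1 if x[0] + x[1] > 1.0 else 2,
]
dsel = [(0.1, 0.1), (0.2, 0.9), (0.9, 0.2), (0.8, 0.8), (0.6, 0.7), (0.4, 0.3)]
query = (0.7, 0.9)

# Region of competence: nearest DSEL samples in the feature space.
region = k_nearest(query, dsel, k=3)

# Most similar output profiles: nearest DSEL samples in the decision space.
dsel_profiles = [output_profile(x, pool) for x in dsel]
similar = k_nearest(output_profile(query, pool), dsel_profiles, k=2)
print(region, similar)
```

The two searches generally return different DSEL samples, which is why the meta-features computed from the output profiles can carry information that the region of competence alone does not.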

For each base classifier $c_i$ belonging to the pool of classifiers $C$, the five sets of meta-features
are computed, returning the meta-features vector $v_{i,j}$. Then, $v_{i,j}$ is used as input to the
meta-classifier $\lambda$. The support obtained by $\lambda$ for the "competent" meta-class is computed as the level
of competence, $\delta_{i,j}$, of the base classifier $c_i$ for the classification of the test sample $x_{j,test}$. As
in [27], we consider a hybrid combination approach called META-DES.H. First, the base
classifiers that achieve a level of competence $\delta_{i,j} > \Upsilon = 0.5$ are selected to compose the ensemble
$C'$. Next, the decision of each selected base classifier is weighted by its level of competence. A
weighted majority voting approach is used to predict the label $w_l$ of the sample $x_{j,test}$. Thus, the
decisions obtained by the base classifiers that attained a higher level of competence $\delta_{i,j}$ have a
greater influence in the final decision.
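
The hybrid combination can be sketched as a selection step followed by a weighted vote. The function name and the example values are hypothetical; in particular, the fallback to the whole pool when no classifier exceeds the threshold is an assumption of this sketch, since the text does not say what happens in that case.

```python
from collections import defaultdict


def meta_des_h(decisions, competences, threshold=0.5):
    """Select classifiers with competence above threshold, then take a weighted majority vote.

    decisions   -- label predicted by each base classifier for the query sample
    competences -- level of competence estimated by the meta-classifier for each one
    """
    selected = [(d, c) for d, c in zip(decisions, competences) if c > threshold]
    if not selected:
        # Assumption (not stated in the text): if nobody is selected, use the whole pool.
        selected = list(zip(decisions, competences))
    support = defaultdict(float)
    for label, weight in selected:
        support[label] += weight  # each vote is weighted by the competence level
    return max(support, key=support.get)


decisions = [1, 2, 1, 2, 2]
competences = [0.9, 0.4, 0.8, 0.6, 0.3]
label = meta_des_h(decisions, competences)
print(label)
```

In this example a plain majority vote over the whole pool would favor class 2 (three votes against two), whereas the hybrid rule keeps only the three classifiers above the threshold and the competence-weighted vote favors class 1.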
4. Why does the META-DES work: A Step-by-step example

In this section, we present a step-by-step example of the training and test phases of the
META-DES framework in order to understand the mechanisms behind the META-DES, and why it
achieves good generalization performance using only linear classifiers. For this example, we use
the P2 problem.

4.1. The P2 Problem 

The P2 is a two-class problem, presented by Valentini [12], in which each class is defined
in multiple decision regions delimited by polynomial and trigonometric functions (Equations 2 to 5).
As in [13], $E_4$ was modified such that the area of each class was approximately equal. The P2
problem is illustrated in Figure 5. One can clearly see that it is impossible to solve this problem
using linear classifiers. The performance of the best possible linear classifier is around 50%.


\begin{align}
E_1(x) &= \sin(x) + 5 \tag{2} \\
E_2(x) &= (x - 2)^2 + 1 \tag{3} \\
E_3(x) &= -0.1 \cdot x^2 + 0.6 \sin(4x) + 8 \tag{4} \\
E_4(x) &= \frac{(x - 10)^2}{2} + 7.902 \tag{5}
\end{align}
Figure 5: The P2 Problem. The symbols I and II represent the area of the classes 1 and 2, respectively.

For this illustrative example, the P2 problem is generated as follows: 500 samples for training
($\mathcal{T}$), 500 instances for the meta-training dataset ($\mathcal{T}_\lambda$), 500 instances for the dynamic selection
dataset DSEL, and 2000 samples for the test dataset, $\mathcal{G}$. For the sake of simplicity, we use a
pool composed of 5 Perceptrons. We demonstrate that using only 5 Perceptrons it is possible to
approximate the complex decision boundary of the P2 problem using the META-DES framework.

4.2. Overproduction 

Figure 6 shows five Perceptrons generated using the bagging technique for the P2 problem.
The arrow in each Perceptron points to the region where the classifier output is class 1 (red circle).
Figure 7 presents the decision of each Perceptron individually.
Figure 6: Five Perceptrons trained for the P2 Problem (axes: Feature 1 and Feature 2). The bagging technique was
used to generate the pool. The arrow in each Perceptron points to the region where the classifier output is class 1
(red circle).

Figure 7: Decision of each of the five Perceptrons shown separately. The arrow in each Perceptron points to the region
where the classifier output is class 1 (red circle).

The best classifier of the pool (Single Best) achieves an accuracy rate of 53.5% ($c_1$). The
performance of all other base classifiers is around the 50% mark. The Oracle result of this pool
obtained a recognition rate of 99.5%. The Oracle is an abstract model defined in [28], which
always selects the classifier that predicted the correct label, for the given query sample, if such a
classifier exists. In other words, it represents the ideal classifier selection scheme. There is at least
one base classifier that predicts the correct label for 99.5% of the test instances. The key issue is
finding the right criteria to estimate the competence of the base classifiers in order to select only
the competent ones.
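
The gap between the Single Best and the Oracle can be made concrete with a small sketch. The prediction matrix below is invented for illustration (it is not the pool of Figure 6); the two functions simply follow the definitions given above.

```python
def single_best_accuracy(predictions, labels):
    """Accuracy of the most accurate individual classifier in the pool."""
    return max(sum(p == t for p, t in zip(clf_preds, labels)) / len(labels)
               for clf_preds in predictions)


def oracle_accuracy(predictions, labels):
    """Fraction of samples for which at least one classifier predicts the correct label."""
    hits = sum(any(clf_preds[j] == labels[j] for clf_preds in predictions)
               for j in range(len(labels)))
    return hits / len(labels)


# predictions[i][j] is the label given by classifier i to sample j (hypothetical values).
labels = [1, 2, 1, 2]
predictions = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [1, 2, 2, 1],
]
print(single_best_accuracy(predictions, labels), oracle_accuracy(predictions, labels))
```

Here every individual classifier is at chance level, yet a correct classifier exists for every sample, so a perfect selection scheme would classify all of them correctly.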

4.3. Meta-training: Sample Selection 

After generating the pool of classifiers $C$, the next step is the sample selection mechanism
for training the meta-classifier. Figure 8 illustrates the effect of the sample selection mechanism.
As in [4, 27], the consensus threshold $h_C$ is set at 70%. Figure 8 (a) shows the original $\mathcal{T}_\lambda$ set
before the sample selection. Figure 8 (b) shows the samples that were selected for training the
meta-classifier.



Figure 8: (a) The original $\mathcal{T}_\lambda$ dataset generated with 500 samples (250 for each class); (b) $\mathcal{T}_\lambda$ after the sample
selection mechanism was applied. 349 samples were selected.


The sample selection mechanism focuses on samples whose correct labels are harder to predict, 
i.e., when there is no consensus between the classifiers in the pool. Samples close to the decision 
boundary are the ones more likely to be selected for the training of the meta-classifier. This 
principle is similar to the support vectors in the SVM technique, in which samples close to the 
decision boundary are used as support vectors to achieve a better separation between classes. 
In the META-DES framework, the samples close to the decision boundary are used to train the
meta-classifier, while samples that are closer to the class mean are not used for training since
the majority of base classifiers can correctly classify those samples. Only the samples shown in
Figure 8 (b) are passed down to the meta-features extraction process and are used for the training
of the meta-classifier $\lambda$.

4.4. Classification 

To illustrate the classification steps of the system, we consider five testing samples in different
parts of the feature space. The coordinates of each query instance are: $x_1 = [0.2, 0.9]$,
$x_2 = [0.2, 0.1]$, $x_3 = [0.5, 0.5]$, $x_4 = [0.8, 0.7]$ and $x_5 = [0.9, 0.85]$. Figure 9 illustrates the
positions of the five testing samples. $x_1$, $x_3$ and $x_5$ belong to class 1; $x_2$ and $x_4$ belong to class 2.

Figure 9: Five samples to be classified. $x_1$, $x_3$ and $x_5$ belong to class 1; $x_2$ and $x_4$ belong to class 2.

In order to compute the region of competence and extract the meta-features for the given query
sample, the dynamic selection dataset (DSEL) is used in the generalization phase. The dynamic
selection dataset is shown in Figure 10.

Figure 10: The dynamic selection dataset (DSEL) that is used to extract the meta-features. The set DSEL was
generated with 500 samples, 250 for each class.

As in our previous papers [4, 25], we consider the size of the region of competence $K = 7$,
i.e., the seven nearest neighbors of the query sample, and the size of the output profiles set $K_p = 5$.
Figure 11 shows the regions of competence of each testing sample. The samples belonging to
the region of competence $\theta_j$, defined using DSEL, are shown for each testing sample separately
(Figure 11 (b) to Figure 11 (f)).

For each test sample $x_j$, five meta-feature vectors are extracted, each one corresponding to
the behavior of one base classifier ($c_1$ to $c_5$) for the classification of $x_j$. Tables 1 to 5 present
the meta-feature vectors obtained for each test sample and base classifier. For each instance $x_j$,
we present the meta-feature vectors computed for each of the 5 base classifiers as well as the
decision obtained by the meta-classifier, denoted by $\delta_{i,j}$. $\delta_{i,j} = 1$ means that the base classifier
was considered competent, and was thus used to predict the label of the query sample.
Figure 11: Local regions computed using the K-Nearest Neighbor algorithm in the feature space. The region of
competence of each testing sample is shown in one sub-figure: (b) neighborhood of $x_1$ in DSEL; (c) neighborhood
of $x_2$ in DSEL; (d) neighborhood of $x_3$ in DSEL; (e) neighborhood of $x_4$ in DSEL; (f) neighborhood of $x_5$ in DSEL.

The sample $x_1$ (Table 1) is an easier classification case since it is located close to the
mean of one of the class centers ($w_1$). We can see in Figure 11 (b) that all instances in the region of
competence of $x_1$ belong to the same class. The classifiers $c_1$, $c_3$ and $c_4$ achieve a 100% recognition
rate in the local region (as can be seen in Figure 7). This also holds true for the decision space,
where those base classifiers present the correct label for the most similar output profiles as well.
Thus, it is clear that they are competent for the classification of $x_1$.


Table 1: Meta-features extracted for the sample $x_1$

      f1 (K = 7)       f2 (K = 7)                            f3     f4 (Kp = 5)   f5     δij
c1    1 1 1 1 1 1 1    0.65 0.66 0.59 0.62 0.66 0.76 0.61    1.00   1 1 1 1 1     0.97   1
c2    0 0 0 0 0 0 0    0.39 0.38 0.31 0.34 0.38 0.35 0.06    0.00   0 0 0 0 0     0.87   0
c3    1 1 1 1 1 1 1    0.84 0.81 0.77 0.81 0.82 0.91 0.82    1.00   1 1 1 1 1     0.99   1
c4    1 1 1 1 1 1 1    0.79 0.78 0.73 0.76 0.79 0.88 0.77    1.00   1 1 1 1 1     0.98   1
c5    0 0 0 0 0 0 0    0.30 0.32 0.24 0.24 0.29 0.37 0.23    0.00   0 0 0 0 0     0.87

For the classification of the instance $x_2$, we can see that it is located closer to the border
separating the two classes. We can see that there are samples in the region of competence of
$x_2$ belonging to both classes. The base classifiers that achieve a good performance considering
both the validation samples in the region of competence $\theta_j$ and the most similar output profiles
(meta-feature $f_4$) are considered competent.


Table 2: Meta-features extracted for the sample $x_2$ (the rows for c1 and c2 are incomplete in the source)

      f1 (K = 7)       f2 (K = 7)                            f3     f4 (Kp = 5)   f5     δij
c3    1 0 1 1 0 0 1    0.89 0.11 0.87 0.87 0.14 0.17 0.81    0.57   1 0 1 1 1     0.99   1
c4    1 0 1 1 0 0 1    0.86 0.13 0.83 0.85 0.15 0.18 0.81    0.57   1 0 1 1 1     0.99   1
c5    0 1 0 0 1 1 0    0.28 0.70 0.19 0.20 0.67 0.79 0.23    0.43   0 1 0 0 0     0.87   0

Surviving values before the c2 label, in order: 1 1 0 0 1 1.00 0.89 1 1 1
Surviving values for c2, in order: 1 1 0 0 1 0.32 0.63 0.62 0.39 0.59 0.57 1 0 1 1 1 0.97 1



The sample $x_3$ is located in a region close to the lines generated by the Perceptrons $c_2$, $c_3$, $c_4$
and $c_5$. However, all neighbor samples of $x_3$ belong to the same class. Thus, the classifiers that
achieve a good performance in the region of competence $\theta_j$, and also for the set $\phi_j$ with the most
similar output profiles of $x_3$, are selected. It is important to note that, in contrast to the testing
instances $x_1$ and $x_2$, both the posterior probability (meta-feature $f_2$) and the classifier's
confidence (meta-feature $f_5$) produce lower results than the ones presented in
Tables 1 and 2, since the samples are closer to the decision boundary of the base classifiers.
Table 3: Meta-features extracted for the sample $x_3$ (columns f1 to f5; most rows are incomplete in the source)

Surviving values from the c1 label up to the c5 label, in order:
0 0 0 0 0 0 0 0.12 0.13 0.08 0 0.66 1 0 0 0 0.52 0.43 0.41 0.57 0.45 0 0 0 0 0
Surviving values for c5, in order: 0 0 0.47 0.43 0 0 0 0 1 0.36

For the sample x_4 (Table 4), we can see that the majority of its neighbor samples come from a
different class (Figure 11 (d)). If we consider dynamic selection techniques that are based solely
on accuracy information, such as local class accuracy (LCA) [8] or overall local accuracy
(OLA) [8], as well as the a priori and a posteriori methods [29], the base classifiers c_2, c_3 and c_4
are considered the most competent. However, these three classifiers predict the wrong label for x_4;
as shown in Figure 7, they would predict that x_4 belongs to class 1 (red circle). So, using only the
accuracy information in the local region (region of competence) may not be sufficient to select the
competent classifiers.
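
To make the accuracy-only criteria concrete, the following minimal Python sketch computes OLA and LCA for a base classifier over a region of competence. The function names and the toy neighborhood are illustrative assumptions and are not taken from the experiments; LCA is written here as the share of neighbors assigned to the class predicted for the query that truly belong to that class, which is one common reading of the definition in [8].

```python
def ola(neighbor_labels, neighbor_preds):
    """Overall local accuracy: share of neighbors the classifier labels correctly."""
    hits = sum(1 for y, p in zip(neighbor_labels, neighbor_preds) if y == p)
    return hits / len(neighbor_labels)


def lca(neighbor_labels, neighbor_preds, query_pred):
    """Local class accuracy with respect to the class predicted for the query."""
    # Labels of the neighbors that the classifier assigns to the predicted class.
    assigned = [y for y, p in zip(neighbor_labels, neighbor_preds) if p == query_pred]
    if not assigned:
        return 0.0
    return sum(1 for y in assigned if y == query_pred) / len(assigned)


# Hypothetical region of competence: 5 of the 7 neighbors belong to class 1,
# while the query sample itself is assumed to belong to class 2.
labels = [1, 1, 1, 1, 1, 2, 2]
preds_a = [1, 1, 1, 1, 1, 1, 1]  # classifier A always answers class 1
preds_b = [2, 1, 2, 1, 2, 2, 2]  # classifier B answers class 2 more often

ola_a, ola_b = ola(labels, preds_a), ola(labels, preds_b)
lca_a = lca(labels, preds_a, query_pred=1)
lca_b = lca(labels, preds_b, query_pred=2)
```

In this toy case classifier A ranks higher under both criteria although it would mislabel the class-2 query, which mirrors the situation described for x_4.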


Table 4: Meta-features extracted for the sample x_4

Values (reading order): 0.64 0.61 0.37 0.58 0.57 0 0 0 1 1 0.90 0; c_5; 1 0 1 0 0 1 0 0.62 0.37 0.57 0.31 0.37 0.72 0.32 0.43 1 1 1 0 0 0.89

Through the use of different meta-features, the META-DES is able to select a competent classifier
(c_1) for the sample x_4. The base classifier c_1 achieves a better performance in the decision
space (meta-feature f_4): it is able to predict the correct class label for the closest samples in the
decision space. Since each output profile x̃_k in the decision space is associated with a sample
x_k in the feature space, we present the most similar output profiles of the sample x_4. We can see
that computing the similarity using the decision space yields distinct results, i.e., different validation
samples are selected for extracting the meta-features. In this case, the closest output profiles,
selected in the decision space, are from samples that belong to the same class as x_4. So, the
meta-features extracted using those samples are more likely to reflect the behavior of the base classifier
c_1 for the classification of the sample x_4. In addition, the base classifier c_1 also presents a higher
posterior probability for the correct class label (meta-feature f_2), and a higher confidence in its
answer for the classification of the query sample x_4 (meta-feature f_5) when compared to the other
base classifiers. Thus, it is considered a competent classifier for the classification of the sample x_4.

It is important to mention that the base classifier c_5 also predicts the correct label for the sample
x_4. However, it was not considered a competent classifier since it presented lower confidence
in its prediction (meta-feature f_5) as well as lower results for f_2 when compared to c_1.


Table 5: Meta-features extracted for the sample x_5

Values (reading order): 0.18 0.29 0 1 0 1 0.97

Considering these five testing samples, an interesting observation from this example is the
influence of the decision space on the estimation of the competence of the base classifiers,
especially the closest output profile (which holds the first position in the vector f_4).
Based on Tables 1 to 5, when the base classifier predicts the correct label for the closest (first)
output profile of the query sample, the probability of the base classifier being selected as competent
is high.
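
The role of the decision space can be sketched in a few lines of Python. The snippet below ranks the DSEL samples by the similarity between their output profiles and that of the query, and then records whether a given base classifier is correct on each of them, with the closest profile in the first position. The function names, the toy profiles, and the use of the Hamming distance as the similarity measure are assumptions made for illustration only.

```python
def hamming(a, b):
    """Number of base classifiers whose decisions differ between two profiles."""
    return sum(1 for u, v in zip(a, b) if u != v)


def output_profile_hits(query_profile, dsel_profiles, dsel_labels, clf_index, kp):
    """Binary vector: is classifier `clf_index` correct on the kp closest profiles?"""
    # sorted() is stable, so ties keep the original DSEL order.
    order = sorted(range(len(dsel_profiles)),
                   key=lambda k: hamming(query_profile, dsel_profiles[k]))
    return [1 if dsel_profiles[k][clf_index] == dsel_labels[k] else 0
            for k in order[:kp]]


# Hypothetical pool of 3 classifiers; each profile holds their predicted labels.
dsel_profiles = [[1, 2, 1], [1, 2, 2], [2, 2, 1], [2, 1, 2]]
dsel_labels = [1, 2, 1, 1]
query_profile = [1, 2, 1]

hits_first = output_profile_hits(query_profile, dsel_profiles, dsel_labels, 0, kp=3)
hits_third = output_profile_hits(query_profile, dsel_profiles, dsel_labels, 2, kp=3)
```

Here the third classifier is correct on all three closest profiles, while the first one is correct only on the closest profile.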

Figure 12 illustrates the decision boundary obtained by the META-DES framework. Using 
only five linear weak classifiers and dynamic selection, we can approximate the complex decision 
boundary of the P2 problem. The methodology used to define the decision boundary obtained by 
the technique is presented in Appendix .1.

Figure 12: Decision boundary obtained by the META-DES system using a pool of 5 Perceptrons. The META-DES
achieves a recognition rate of 95.50%.

When we apply static combination rules such as majority voting, or AdaBoost, the classification
accuracy is much lower. Figure 13 illustrates the decision boundaries obtained by static ensemble
techniques using five Perceptron classifiers. We show the decisions obtained using the Average,
Majority voting, Product, and Maximum rules, as well as AdaBoost. The average and product
rules achieve a recognition rate of 47.5%, while the maximum and majority voting rules obtain an
accuracy of 50%, and AdaBoost 56%. This can be explained by the fact that all classifiers in the
pool are used to predict the label. However, due to the complexity of the problem, the degree of
disagreement between the classifiers is very high. For the majority of test samples, half of the base
classifiers disagree with the other half (i.e., they predict a different class label). The decisions of
classifiers that are not experts for the local region end up negatively influencing the final decision.
Thus, the static combination rules yield results that are close to random guessing. Even using
techniques that assign weights to the base classifiers, such as AdaBoost, we cannot approximate the
complex decision boundary of the P2 problem using only five linear classifiers.
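
As a rough illustration of why the fixed rules struggle, the Python sketch below applies the average, product, maximum and majority voting rules to the posterior probabilities of a pool for a single query. The function name `combine` and the posterior values are hypothetical; in this toy case the true class is assumed to be class 0 and only the first classifier is assumed to be an expert for the local region.

```python
from math import prod


def combine(posteriors):
    """posteriors: one [P(class 0), P(class 1), ...] list per base classifier."""
    n_classes = len(posteriors[0])
    # Per-class columns of supports, and the hard vote of each classifier.
    cols = [[p[c] for p in posteriors] for c in range(n_classes)]
    votes = [max(range(n_classes), key=lambda c: p[c]) for p in posteriors]
    scores = {
        "average": [sum(col) / len(col) for col in cols],
        "product": [prod(col) for col in cols],
        "maximum": [max(col) for col in cols],
        "majority": [votes.count(c) for c in range(n_classes)],
    }
    # Each rule outputs the class with the highest combined support.
    return {rule: max(range(n_classes), key=lambda c: s[c])
            for rule, s in scores.items()}


posteriors = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]
decisions = combine(posteriors)
expert_decision = 0 if posteriors[0][0] > posteriors[0][1] else 1
```

All four rules output class 1 here, whereas using only the assumed local expert gives class 0, which is the behavior that dynamic selection aims for.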
Figure 13: Decision boundaries generated by each static combination method. The pool of classifiers is composed of
the 5 Perceptrons presented in Figure 6.

5. Further Analysis 


In this section, we evaluate the following aspects of the META-DES framework using the P2
problem:

1. The effect of the pool size on the classification accuracy.

2. The effect of the size of the dynamic selection dataset (DSEL) on the classification
performance of the system.

3. The results of the static combination techniques for the P2 problem. This analysis is
performed in order to provide an insight into why dynamic selection should be preferred for
solving complex classification problems.

4. The results of classical pattern recognition techniques such as Support Vector Machines and
Random Forest for the P2 problem.

For the sake of simplicity, we use the same methodology used in the previous section: 500
samples for training (T), 500 instances for the meta-training dataset (T_λ), 500 instances for the
dynamic selection dataset DSEL, and 2000 samples for testing, Q. For each set, the prior
probabilities of both classes are equal. Moreover, since the objective of this work is to study whether
dynamic selection of linear classifiers can solve complex non-linear classification problems, we
also consider Decision Stumps [31] as base classifiers. We show that the META-DES framework
works equally well using a pool of Decision Stumps.

5.1. The Effect of the Pool Size 

For this experiment, we varied the size of the pool from 5 to 100 in steps of 5 (20 results
are obtained). The size of the dynamic selection dataset (DSEL) was set at 500 (as shown
in Figure 10). The effect of the size of the pool of classifiers, M, is shown in Figure 14. We can
see that the size of the pool does not have a significant impact on the classification accuracy of
the META-DES, especially when the Perceptron is considered as the base classifier. This finding
can be explained by the fact that using only 5 base classifiers, the Oracle (ideal selection scheme)
achieves a classification accuracy of 99.5% and 100% using Perceptrons and Decision Stumps,
respectively. In other words, using five base classifiers, it is possible to represent the whole feature
space. The key to having good classification performance lies in defining a criterion to select the
best classifier(s) for any given test sample. An interesting point is that the recognition performance
using Decision Stumps decreases when more than 25 base classifiers are used. Therefore, adding
more classifiers does not always lead to higher classification accuracy.
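
The Oracle figure quoted above counts a test sample as correct whenever at least one base classifier in the pool predicts its label. A minimal Python sketch of that computation follows; the function name and the two-classifier toy pool are illustrative assumptions.

```python
def oracle_accuracy(pool_predictions, labels):
    """pool_predictions[i][j]: label predicted by classifier i for test sample j."""
    hits = sum(1 for j, y in enumerate(labels)
               if any(preds[j] == y for preds in pool_predictions))
    return hits / len(labels)


def accuracy(preds, labels):
    """Plain accuracy of a single classifier."""
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


# Toy pool: each classifier always answers the same class.
labels = [1, 2, 1, 2]
pool_predictions = [[1, 1, 1, 1], [2, 2, 2, 2]]

individual = [accuracy(p, labels) for p in pool_predictions]
oracle = oracle_accuracy(pool_predictions, labels)
```

Each toy classifier is right half of the time, yet the Oracle reaches 100%, so the remaining difficulty lies entirely in the selection criterion.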
Figure 14: The effect of the pool size, M, on the classification accuracy. Perceptrons and Decision Stumps are
considered as base classifiers.

Figures 15 and 16 illustrate the decision boundaries obtained by the META-DES framework
using Perceptrons and Decision Stumps, respectively, as base classifiers. We can see that when only
5 base classifiers are used, the decision boundary of the META-DES is close to the real decision
boundary of the problem.
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Figure 15: Decision boundaries generated by the META- 
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Figure 16: Decision boundaries generated by the META- 
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5.2. The effect of the size of the dynamic selection dataset (DSEL) 

Figure 17 shows the performance of the META-DES using both Perceptrons and Decision
Stumps according to the DSEL size. We varied the size of the dynamic selection dataset from
50 to 1000 in steps of 50 (20 configurations were tested). The distributions obtained by varying
the size of DSEL are presented in Appendix .4. For this experiment, the size of the pool was set at 100.
We can observe that the size of the dynamic selection dataset has a greater influence on the
classification result than the size of the pool. This can be explained by the fact that the dynamic
selection dataset, DSEL, is used in estimating the competence of the base classifiers, as shown in
the classification example (Section 4.4). With more samples in DSEL, the probability of selecting
samples that are similar to the query sample in both the feature space and the decision space for
extracting the meta-features is higher. Hence, a better estimation of the competence of the base
classifiers is achieved.
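
The geometric intuition behind this effect can be checked with a small Python sketch: as DSEL grows, the nearest neighbors of a query sample lie closer to it. The snippet uses uniformly distributed points in [0, 1] x [0, 1] rather than the P2 distribution, the Euclidean distance, and K = 7 neighbors; all three choices, as well as the function name, are assumptions made only for illustration.

```python
import random
from math import dist


def mean_knn_distance(query, samples, k):
    """Mean Euclidean distance from the query to its k nearest samples."""
    return sum(sorted(dist(query, s) for s in samples)[:k]) / k


random.seed(0)
pool = [(random.random(), random.random()) for _ in range(1000)]
query = (0.5, 0.5)

# Nested subsets, so that each larger DSEL contains the smaller ones.
sizes = [50, 250, 1000]
distances = {n: mean_knn_distance(query, pool[:n], k=7) for n in sizes}
```

Because the subsets are nested, the mean neighbor distance can only shrink or stay equal as the dataset grows, so the region of competence becomes more local to the query.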



Figure 17: The effect of the DSEL size on the classification accuracy. Perceptrons and Decision Stumps are considered
as base classifiers. The results are obtained using a pool with 100 base classifiers, M = 100.

Moreover, to better understand the influence of both the size of the pool and the size of the
dynamic selection dataset together, we constructed a 3D mesh plot showing the accuracy of the
system according to both parameters (Figures 18 and 19).

Figure 18: The effect of the pool size and the validation set size (DSEL) on the accuracy of the system. Perceptrons
are used as base classifiers.

Figure 19: The effect of the pool size and the validation set size (DSEL) on the accuracy of the system. Decision
Stumps are used as base classifiers.

5.3. Results of static combination techniques 

Figures 20 and 21 illustrate the accuracy rates of static combination techniques by varying the 
size of the pool of classifiers. Furthermore, the decision boundaries for the static combination 
techniques are shown in Figures 22 and 23 for Perceptrons and Decision Stumps, respectively. 
Even when the size of the pool is increased to 100 base classifiers (Figure 22), the static
combination techniques cannot approximate the decision boundary of the P2 problem. The performance
using Decision Stumps as base classifiers is significantly better than that using Perceptrons for
the static combination rules, especially considering the AdaBoost technique. This fact can be
explained by the divide-and-conquer approach of Decision Stumps, in which each stump is trained
using a single feature. Hence, the classification task may become easier for the classifier model.
However, the classification accuracy is still far from the performance obtained by the META-DES
framework. Even using only 5 base classifiers, the performance of the META-DES is superior
to that of static combination techniques using up to 100 base classifiers.



Figure 20: Results of static combination techniques using Perceptrons as base classifiers.

Figure 21: Results of static combination techniques using Decision Stumps as base classifiers.

Figure 22: Decision boundaries generated by each ensemble method. The pool of classifiers is composed of 100
Perceptron classifiers.

Figure 23: Decision boundaries generated by each ensemble method. The pool of classifiers is composed of 100
Decision Stump classifiers.

5.4. Single classifier models 

In this section, we show the results of classical classification models for the P2 problem. We 
evaluate three classifier models: MLP Neural Network, Support Vector Machines with Gaussian 
Kernel (SVM) and Random Forest classifier. These classifiers were selected based on a recent 
study [32] that ranked the best classification models in a comparison considering a total of 179 
classifiers over 121 classification datasets. All the classifiers were evaluated using the Matlab 
PRTOOLS toolbox [33]. The parameters of each classifier were set as follows: 

1. MLP Neural Network LM: The validation data was used to select the number of nodes in
the hidden layer. We used a configuration with 100 neurons in the hidden layer. The training
process was performed using the Levenberg-Marquardt algorithm. The training process
was stopped if its performance on the validation set decreased or failed to improve for five
consecutive epochs.

2. MLP Neural Network RPROP: The validation data was used to select the number of nodes in
the hidden layer. We used a configuration with 100 neurons in the hidden layer. The training
process was performed using the Resilient Backpropagation algorithm [34] since this
algorithm presented both a faster convergence and better classification performance in many
applications [16]. The training process was stopped if its performance on the validation set
decreased or failed to improve for five consecutive epochs.

3. SVM: An SVM with a Gaussian (radial basis function) kernel was used. A grid search was
performed in order to set the values of the regularization parameter c and the kernel spread
parameter γ.

4. Random Forest: We vary the number of trees from 25 to 200 in steps of 25. The
configuration with the highest performance on the validation dataset is used for generalization.
Since there are only two features in the P2 problem, decision stumps are used as trees (depth = 1).
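
The stopping rule used for both MLP configurations can be read as a patience counter over the validation performance. The Python sketch below is one possible reading of that rule, not the PRTOOLS implementation; the function name, the `patience` parameter, and the validation scores are hypothetical.

```python
def early_stopping_epochs(val_scores, patience=5):
    """Return (best_epoch, stop_epoch) for a sequence of validation accuracies."""
    best_score, best_epoch, waited = float("-inf"), -1, 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # Strict improvement: remember it and reset the counter.
            best_score, best_epoch, waited = score, epoch, 0
        else:
            # Performance decreased or failed to improve.
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_scores) - 1


val_scores = [0.60, 0.70, 0.75, 0.74, 0.75, 0.73, 0.72, 0.71, 0.80]
best_epoch, stop_epoch = early_stopping_epochs(val_scores)
```

With these scores, training stops at epoch 7 and keeps the model from epoch 2, so the later value of 0.80 is never reached; this is the usual trade-off of a fixed patience.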

Since these classifiers do not require a meta-training stage, in these experiments, we merge the
training (T) and meta-training (T_λ) sets into a single training set, thereby training the classifiers
with 1000 samples. The samples in the dynamic selection dataset (DSEL) are used as the
validation dataset. The decision boundary obtained by each classifier is presented in Figure 24. The
MLP neural network trained with Levenberg-Marquardt obtained a recognition accuracy of 90%,
while that trained with the Resilient Backpropagation algorithm obtained 77%. The SVM obtained
a recognition accuracy of 93%, and the Random Forest classifier achieved 91%. The classification
accuracy of these single classifier models is lower than the performance of the META-DES using
a pool of either five Perceptrons or five Decision Stumps. This result can be explained by the
complex nature of the P2 problem. It is difficult to properly train a strong classifier to learn the
separation between the two classes. These classifiers might require more training samples in order
to obtain better generalization performance.
Figure 24: Decision boundaries obtained using a single classifier. (a) MLP-NN with 100 neurons in the hidden layer
trained using Levenberg-Marquardt (90% accuracy). (b) MLP-NN with 100 neurons in the hidden layer trained using
Resilient Backpropagation (77% accuracy). (c) Random Forest classifier (91% accuracy). (d) Support Vector Machine
with a Gaussian kernel (93% accuracy).
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6. Conclusion 


In this work, we perform a DEEP analysis of the META-DES framework using linear classifiers.
The analysis is conducted using the P2 problem, which is a complex non-linear problem with
two classes having multiple class centers. We demonstrate that using the META-DES framework,
we can approximate the complex non-linear distribution of the P2 problem using few linear
classifiers. The accuracy rate provided by the best linear classifiers trained for this problem is around
50%. We demonstrate that static combination techniques cannot approximate the
complex decision frontier of the P2 problem. Because of the complex nature of the P2 problem, for
most test samples, there is high disagreement between the predictions made by the base classifiers.
Since there is no consensus regarding the correct label for the test sample, the static combination
techniques end up making random decisions. Even using techniques that assign weights to the
base classifiers, such as AdaBoost, the classification accuracy using 100 base classifiers is still
far below the performance of the META-DES framework. Classifiers that are not experts
in the local region where the query instance is located end up negatively influencing the decision
of the system. Using dynamic selection, the decisions of the base classifiers that are not experts
for the given query sample are not taken into account. Only the most competent classifiers are
selected to predict the label of the query sample.

The size of the pool of classifiers did not have a significant influence on the recognition rate.
This finding can be explained by the fact that using only 5 base classifiers, the Oracle performance
of the pool is already at or close to 100% (99.5% using Perceptrons and 100% using Decision Stumps).
In other words, for nearly every testing sample there is at least one base classifier that predicts the
correct class label. The crucial element here is the criterion used to estimate the
level of competence of the base classifiers in order to select those that predict the correct
class label for a given test sample. Moreover, we noticed a performance drop with Decision
Stumps as base classifiers when more than 25 base classifiers are used. These results indicate that
increasing the number of base classifiers in the pool does not always lead to greater classification
accuracy. Thus, one aspect of the framework that must be further investigated is how many base
classifiers should be trained in the overproduction phase for a given classification problem.

We evaluate the impact of the size of the pool of classifiers and the size of the dynamic selection
dataset (DSEL) that is used in dynamically estimating the level of competence of the base classifiers.
Experimental results show that the size of the dynamic selection dataset has a higher impact on
classification performance. This can be explained by the fact that the majority of the meta-features
proposed for the META-DES framework are extracted from instances in DSEL that are similar to
the query sample, considering both the feature space and the decision space. With more samples
in DSEL, the probability of selecting samples that are similar to the query sample in both the
feature space and the decision space for extracting the meta-features is higher. Hence, a better
estimation of the competence of the base classifiers is achieved. The results found in this analysis
should be considered as a guideline for future work on the META-DES and for other dynamic
ensemble selection techniques based on local accuracy information in general.

Furthermore, the META-DES framework presented a higher classification accuracy for the P2
problem than did the classical single classifier models. This finding may be attributed to the
complex nature of the P2 problem, since a classifier such as an SVM or an MLP neural network may
require more training samples for a better generalization performance. Using dynamic selection
through the META-DES framework, we can approximate the complex decision boundary of the P2
problem using less training data.

It is important to mention that there is still room for improvement in the META-DES framework.
Using five base classifiers, the accuracy rate obtained by the META-DES is around 95%,
while the Oracle performance is close to 100%. Future work will involve the definition of new
meta-features in order to achieve a behavior that is closer to the ideal dynamic selection technique
(Oracle).





Appendix .1. Plotting decision boundaries 

When dealing with dynamic classifier or ensemble selection, for each classification sample
x_{j,test}, a specific ensemble or base classifier is selected to perform the classification. Since there
is no single fixed model, the decision boundary must be traced point by point. Thus, a grid
is generated over the 2D feature space. The grid is generated in the same interval as the P2 classification
problem, [0, 1] for both axes. Each point on the 2D grid is passed down to the dynamic selection
technique in order to predict its label. After every point on the 2D grid is evaluated, the MATLAB
contour plot is used to separate the points that were assigned to each of the two classes. It is
important to mention that the number of points on the grid influences the definition of the decision
boundary. A high number of points in the grid leads to a more precise decision boundary. In our
experiment, we use a 100 x 100 grid, for a total of 10,000 points, in order to have a more precise
decision boundary map. For the static combination rules and classification models, the decision
boundaries are plotted using the plotc function from the PRTOOLS Matlab Toolbox [33].
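
The grid procedure described above can be sketched in Python as follows. The function `label_grid` and the placeholder `toy_predict`, which stands in for the dynamic selection technique, are hypothetical names; the original experiments were carried out in MATLAB.

```python
def label_grid(predict, n=100, lo=0.0, hi=1.0):
    """Evaluate `predict` on an n x n grid over [lo, hi] x [lo, hi]."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    # grid[r][c] is the label predicted for the point (axis[c], axis[r]).
    grid = [[predict((x, y)) for x in axis] for y in axis]
    return axis, grid


def toy_predict(point):
    """Placeholder for the dynamic selection technique (assumed, not META-DES)."""
    x, y = point
    return 1 if y > x else 2


axis, grid = label_grid(toy_predict)
n_points = sum(len(row) for row in grid)
```

The resulting matrix of labels can then be passed to a contour routine to draw the boundary between the two classes.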

Appendix .2. Ensemble Generation 

Figures .25 and .26 illustrate the pool of classifiers generated with bagging using Perceptrons
and Decision Stumps, respectively. We consider pools of 5, 10, 25, 50, 75 and 100 base classifiers.
Considering a pool size of 100 classifiers, we can see that most of the classifiers are in the same
region. Thus, we believe the majority of classifiers are redundant. This can be explained by the
fact that we used bagging for the generation of the pool. In the bagging technique, the bootstraps are
randomly taken from the training data, and as such, there is no guarantee that a high diversity pool
will be achieved. The use of techniques such as the Random Oracle [35] may be considered in
the future as an alternative for the generation of the pool in order to achieve higher diversity at the
pool level.
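
For reference, the bootstrap sampling step of bagging can be sketched in Python as below. The function name, the seed, and the choice of drawing as many samples per bootstrap as there are training samples are assumptions made for illustration; each index list would be used to train one base classifier.

```python
import random


def bootstrap_indices(n_samples, n_classifiers, seed=0):
    """One list of indices, drawn with replacement, per base classifier."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_classifiers)]


bags = bootstrap_indices(n_samples=500, n_classifiers=5)
# Share of distinct training samples that appear in each bootstrap.
unique_share = [len(set(bag)) / 500 for bag in bags]
```

Since every bootstrap is drawn from the same training distribution, each one is expected to contain roughly 63% of the distinct training samples, which is consistent with the observation that many of the resulting classifiers end up in the same region.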

Appendix .3. Sample Selection Mechanism: consensus threshold h_c

In this section, we show the results of the sample selection mechanism by varying the value
of the threshold h_c. Since the sample selection mechanism depends on the base classifiers (i.e.,
the consensus among the pool), we show the result of the sample selection mechanism using both
Perceptrons and Decision Stumps (Figures .27 and .28, respectively).

Samples close to the decision boundary are the ones more likely to be selected for the training
of the meta-classifier. Hence, the sample selection mechanism focuses on samples that are close
to the decision boundaries and whose correct labels are thus harder to predict. This principle is similar
to the support vectors in the SVM, where samples close to the decision boundaries are used to
obtain the best separating hyperplanes. In our case, the samples close to the decision boundary
are used to train the meta-classifier to distinguish a competent classifier from an
incompetent one in cases where a disagreement exists between the base classifiers in the pool.
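
A minimal Python sketch of this selection step is given below. The consensus is measured here as the share of the pool voting for the most voted class, and a sample is kept when that share falls below h_c; both the measure and the function names are assumptions made for illustration and may differ from the exact definition used in the framework.

```python
from collections import Counter


def consensus(votes):
    """Share of the pool that votes for the most voted class (assumed measure)."""
    return Counter(votes).most_common(1)[0][1] / len(votes)


def select_for_meta_training(pool_votes, h_c):
    """Indices of the samples whose pool consensus falls below h_c."""
    return [j for j, votes in enumerate(pool_votes) if consensus(votes) < h_c]


# Votes of a hypothetical pool of 5 classifiers for 4 meta-training samples.
pool_votes = [
    [1, 1, 1, 1, 1],  # full agreement
    [1, 1, 1, 2, 2],  # 60% consensus
    [1, 2, 2, 2, 2],  # 80% consensus
    [2, 1, 2, 1, 2],  # 60% consensus
]
selected = select_for_meta_training(pool_votes, h_c=0.7)
```

With h_c = 0.7, only the two samples on which the pool is split 3 to 2 are kept, i.e., the ones most likely to lie near a decision boundary.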

Appendix .4. Size of the dynamic selection dataset (DSEL) 

Figure .29 shows the dynamic selection dataset (DSEL) generated with different sizes. The
figures show the exact distributions of the dataset DSEL used to evaluate the performance of the
META-DES framework according to its size (Section 5.2). The size of the DSEL has a significant
impact on the performance of the META-DES framework (Figure 17). This can be explained by the fact
that the meta-features are extracted based on the neighborhood of the query sample x_{j,test} projected in
DSEL.

Figure .25: Perceptron classifiers generated during the overproduction phase. The Bagging technique is used to
generate the pool of classifiers.

Figure .26: Decision Stump classifiers generated during the overproduction phase. The Bagging technique is used to
generate the pool of classifiers.

Figure .27: Meta-training dataset T_λ after the sample selection mechanism is applied. A pool composed of 100
Perceptrons is used.

Figure .28: Meta-training dataset T_λ after the sample selection mechanism is applied. A pool composed of 100
Decision Stumps is used.

Figure .29: Distributions of the dynamic selection dataset (validation), used to extract the meta-features during the
generalization phase of the system.
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